Estimating Grammar Parameters Using Bounded Memory

Authors

  • Tim Oates
  • Brent Heeringa
Abstract

Estimating the parameters of stochastic context-free grammars (SCFGs) from data is an important, well-studied problem. Almost without exception, existing approaches make repeated passes over the training data. The memory requirements of such algorithms are ill-suited for embedded agents exposed to large amounts of training data over long periods of time. We present a novel algorithm, called HOLA, for estimating the parameters of SCFGs that computes summary statistics for each string as it is observed and then discards the string. The memory used by HOLA is bounded by the size of the grammar, not by the amount of training data. Empirical results show that HOLA performs as well as the Inside-Outside algorithm on a variety of standard problems, despite the fact that it has access to much less information.
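
To illustrate the bounded-memory idea only, here is a minimal Python sketch. It is not HOLA, whose details the abstract does not give. It swaps in plain relative-frequency estimation and assumes, hypothetically, that each observed string arrives together with the list of rules used to derive it. The class name `StreamingRuleCounter` and its methods are invented for this illustration.

```python
from collections import defaultdict


class StreamingRuleCounter:
    """Hypothetical illustration: one counter per grammar rule.

    Memory depends on the number of rules, not on how many strings are seen.
    """

    def __init__(self, rules):
        # rules: iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
        self.counts = {rule: 0 for rule in rules}

    def observe(self, derivation):
        # derivation: the rules used to derive one string (assumed to be given).
        # Unknown rules raise KeyError, so the table never grows.
        for rule in derivation:
            self.counts[rule] += 1
        # Nothing else is stored; the caller may now discard the string.

    def probabilities(self):
        # Relative frequency of each rule among rules sharing its left-hand side.
        totals = defaultdict(int)
        for (lhs, _), count in self.counts.items():
            totals[lhs] += count
        return {
            rule: (count / totals[rule[0]] if totals[rule[0]] else 0.0)
            for rule, count in self.counts.items()
        }
```

Each call to `observe` updates a fixed-size table and keeps no reference to its input, so a stream of any length can be processed in a single pass.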

Similar resources

Parallel Parsing of Languages Generated by Ambiguous Bounded Context Grammars

Using the CRCW PRAM model, we describe a language recognition algorithm for an arbitrary grammar in the class of BCPP grammars [9]. (BCPP grammars, which admit ambiguity, are a generalization of both the NTS grammars [14] and Floyd's bounded context (BC) grammars [4].) Using n processors, the algorithm runs in time O(h log n) (O(h) in the case of an unambiguous grammar), where n is the length of t...

Incremental Parsing in Bounded Memory

This tutorial will describe the use of a factored probabilistic sequence model for parsing speech and text using a bounded store of three to four incomplete constituents over time, in line with recent estimates of human short-term working memory capacity. This formulation uses a grammar transform to minimize memory usage during parsing. Incremental operations on incomplete constituents in this t...

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input

This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM). We deploy this algorithm to shed light on the extent to which human language learners can discover hierarchical syntax through distributional statistics alone, by modeling two widely-accepted features of human language acqui...

A Bounded Rationality Model of Information Search and Choice in Preference Measurement

It is becoming increasingly easy for researchers and practitioners to collect eye-tracking data during online preference measurement tasks. The authors develop a dynamic discrete choice model of information search and choice under bounded rationality, which they calibrate using a combination of eye-tracking and choice data. Their model extends Gabaix et al.’s (2006) directed cognition model b...

Publication date: 2002